1 Extended Multiplier



1.1 Purpose

The purpose of the extended multiplier is to build more functionality into the existing
multiplier, optimizing it for the expected customer applications. This is accomplished by
adding 2 additional multipliers and 4 additional adders per tile. These are all added to the
current multiplier block, in essence creating a large functional unit that is capable of significant
processing.

To reduce the overhead of this effort, it has been decided to use the same input muxes
and output muxes as the current multipliers. This means that the block has 4 * 32-bit
inputs and 2 * 32-bit outputs. Thus, in order to take advantage of having 4 multipliers, the inputs
have to be shared between them, and the outputs need to be shared as well. Input sharing is
generally accomplished by packing the inputs into the high and low halves of the input words.
Output sharing is accomplished either by accumulating the results through the new adders or by
packing the results into the output registers.

Finally, for backward compatibility, we need to provide for the case where the new features are
bypassed. This is accomplished by selecting the input and output muxes in the same fashion for
the old bits, so that the new bits (when defaulted to 0) do not affect the circuit.
The opcode is also designed so that the default case of all 0's causes the multipliers to behave as
before.

2 Implementation

The XMULT, or "extended multiplier", builds on the existing multiplier block by maintaining the
existing I/O structure. It adds 2 additional multipliers and 4 adders to create a structure that
can be configured for a variety of different functions. A simplified diagram of the multiplier
is illustrated below, along with a table of the target opcodes. The wiring pattern illustrates a
backwards-compatible mode. Components in bold illustrate the existing multiplier. Each adder or
multiplier has 2 input muxes and an optional output register.

[Figure: simplified XMULT block diagram (inp0, mult0, add1, add2, Omux0, Omux1)]

opcode     latency  function
2MULT      2        current multiplier mode
4ADD32     3        sum of 4 32-bit inputs
4ADD16     4        sum of 4 sets of packed 16-bit inputs
4MULT      2        4 independent multipliers with 16-bit packed inputs and outputs
4MULTSUM   4        4 multipliers with 16-bit packed inputs and outputs summed together in tree
4MULT2SUM  3        4 multipliers with 16-bit packed inputs and outputs summed together in tree
4FIR       N/A      4 multipliers with 16-bit packed coefficients, a single 16-bit data input,
                    a 32-bit accumulation input and a 32-bit output, accumulated in cascade
                    with pipeline registers between accumulators
CMULT      4        16-bit packed complex multiply with 32-bit IQ accumulation input/output
CMULT16    3        16-bit packed complex multiply with FFT butterfly adders with complex
                    input, output

OPERATOR MUX selects by OPCODE

Operator n  2MULT       4ADD32      4ADD16      4MULT       4MULT+      4MULT2+     4FIR        CMULT16     CMULT
mult0-a     i0[31:16]                           i0[31:16]   i0[31:16]   i0[31:16]   i2[15:0]    i0[31:16]   i0[31:16]
            i0[15:0]
mult0-b     i1[31:8]                            i1[31:16]   i1[31:16]   i1[31:16]   i1[31:16]   i1[31:16]   i1[31:16]
            i1[31:16]
            i1[15:0]
mult1-a                                         i0[15:0]    i0[15:0]    i0[15:0]    i2[15:0]    i0[15:0]    i0[15:0]
mult1-b                                         i1[15:0]    i1[15:0]    i1[15:0]    i1[15:0]    i1[15:0]    i1[15:0]
mult2-a     i2[31:16]                           i2[31:16]   i2[31:16]   i2[31:16]   i2[15:0]    i0[31:16]   i0[31:16]
            i2[15:0]
mult2-b     i3[31:8]                            i3[31:16]   i3[31:16]   i3[31:16]   i3[31:16]   i1[15:0]    i1[15:0]
            i3[31:16]
            i3[15:0]
mult3-a                                         i2[15:0]    i2[15:0]    i2[15:0]    i2[15:0]    i0[15:0]    i0[15:0]
mult3-b                                         i3[15:0]    i3[15:0]    i3[15:0]    i3[15:0]    i1[31:16]   i1[31:16]
add0-a   2              i0[31:0]    i0[31:0]                m0[31:0]    m0[31:0]    i0[31:0]    m0[31:0]    m0[31:0]
add0-b   3              i1[31:0]    i1[31:0]                m1[31:0]    m1[31:0]    m1[31:0]    -m1[31:0]+1
add1-a   3              i2[31:0]    i2[31:0]                m2[31:0]    m2[31:0]    a2[31:0]    m2[31:0]    m2[31:0]
add1-b   2              i3[31:0]    i3[31:0]                m3[31:0]    m3[31:0]    m3[31:0]    m3[31:0]    m3[31:0]
add2-a   2              a0[31:0]    a0[31:0]                a0[31:0]                m0[31:0]                a0[31:0]
add2-b   3              a1[31:0]    a1[31:0]                a1[31:0]                a0[31:0]                i2[31:0]
add3-a   3                          a2[31:16]                                       a1[31:0]    32'h0       a1[31:0]
add3-b                              a2[15:0]                                        m2[31:0]    i3[31:0]    i3[31:0]
omux0    6  m0[31:0]    a2[31:0]    a2[31:0]    m0[31:16]   a2[31:0]    a0[31:0]                a0[31:16]   a2[31:0]
                                                m1[31:16]                                       a1[31:16]
omux1    6  m2[31:0]    a2[Cout]    a3[31:0]    m2[31:16]   a2[Cout]    a1[31:0]    a3[31:0]    a3[31:0]    a3[31:0]
                                                m3[31:16]

i0-i3 = input mux
m0-m3 = mults
a0-a3 = adders
-a = operand a
-b = operand b



The carry of a 32-bit ADD operation is not brought out unless explicitly specified in this document.
The precision of the ADD operation right after the multipliers is not lost, due to the duplicate sign
bit in the result of the multipliers. For any other additions, it is the user's responsibility to avoid
overflow. The user may also apply the shift-down operation to the result of the adders
to reduce the loss of precision.
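
The duplicate-sign-bit argument above can be checked numerically. The following is a minimal sketch, assuming signed two's-complement 16-bit operands and 32-bit products; the helper names are illustrative and do not come from the specification. It shows that a signed 16x16 product carries a redundant sign bit (bits 31 and 30 are equal) in every case except (-32768) * (-32768), so the sum of two such products fits in 32 bits outside that single corner case.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def mul16(a: int, b: int) -> int:
    """Exact signed 16x16 product (always representable in 32 bits)."""
    assert INT16_MIN <= a <= INT16_MAX and INT16_MIN <= b <= INT16_MAX
    return a * b


def has_duplicate_sign(p: int) -> bool:
    """True when bits 31 and 30 of the 32-bit two's-complement value agree."""
    return ((p >> 31) & 1) == ((p >> 30) & 1)


def fits_int32(x: int) -> bool:
    """True when x is representable as a signed 32-bit value."""
    return INT32_MIN <= x <= INT32_MAX


# Largest-magnitude products that still carry a duplicate sign bit.
p_pos = mul16(INT16_MIN, INT16_MIN + 1)   # 2**30 - 2**15
p_neg = mul16(INT16_MIN, INT16_MAX)       # -(2**30 - 2**15)

# The single corner case: (-32768)**2 = 2**30 has no redundant sign bit.
p_corner = mul16(INT16_MIN, INT16_MIN)
```

Under these assumptions, `fits_int32(p_pos + p_pos)` holds, while `p_corner + p_corner` equals 2**31 and does not fit, which is the case a user would have to avoid or scale down.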

2.1 Bit file encoding

There are 10 bits in the CSMMULT that are not used. These are the 'RESERVED' fields, as well as
the mult_h lsm wen and the mult_h lsm dynamic mode bits. The wen is not used, as lsm3 is
never used as a write lsm unless it is connected to at least one other lsm. The write enable is
routed with the write address, and the multiplier cannot generate the write address. The dynamic
mode bits are routed with the write data, and the multiplier cannot generate write data. Thus,
neither of these fields is meaningful in the CSMMULT.

The multiplier a-input mux select is named muxafghsel, which is currently a 1-bit field. We will
extend this to 2 bits for both multipliers, at the cost of 2 CSM bits. The output select is named
muxmultlsmsel, which is currently a 2-bit field. We will extend this to 3 bits for both
multipliers, at the cost of 2 CSM bits. We will add a 5-bit opcode, which will essentially be
shared, and which will therefore cost 5 CSM bits. One of the remaining 2 bits will be used to
selectively shift all of the multiplier results up by one bit, thereby normalizing off the redundant
sign bit. The other remaining bit will be used to selectively shift the adder outputs down by one
bit, in order to normalize the adder results. Thus, all of the available CSMMULT bits will be
utilized by the new design.



5=4 

Opcodes (new 4 bit CSMMULT field)

Name       Bit 4  Bit 3  Bit 2  Bit 1  Bit 0  Multbypass[3:0]  Adder s16bit[3:0]  Adder bypass[3:0]
2MULT      0      0      0      0      0      x1x1             xxxx               xxxx
4ADD32     0      0      1      0      0      xxxx             x000               1100
4ADD16     0      0      1      0      1      xxxx             1111               1000
4MULT      0      1      0      0      0      0000             xxxx               xxxx
4MULTSUM   0      1      0      0      1      0000             x000               x100
4MULT2SUM  0      1      0      1      0      0000             xx00               1111
4FIR       0      1      1      0      0      0000             0000               0000
CMULT      0      1      1      1      0      0000             0000               1100
CMULT16    0      1      1      1      1      0000             0x11               0x11
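
As an illustration of the opcode table above, the encodings can be held in a small lookup table. This is a sketch only: the dictionary and helper names are hypothetical, and the values are transcribed from the Bit 4 to Bit 0 columns rather than taken from any tool or header file.

```python
# Opcode name -> 5-bit value, written MSB first (Bit 4 .. Bit 0) as in the table.
OPCODES = {
    "2MULT":     0b00000,
    "4ADD32":    0b00100,
    "4ADD16":    0b00101,
    "4MULT":     0b01000,
    "4MULTSUM":  0b01001,
    "4MULT2SUM": 0b01010,
    "4FIR":      0b01100,
    "CMULT":     0b01110,
    "CMULT16":   0b01111,
}


def opcode_bits(name: str) -> tuple:
    """Return (Bit 4, Bit 3, Bit 2, Bit 1, Bit 0) for a named opcode."""
    value = OPCODES[name]
    return tuple((value >> i) & 1 for i in range(4, -1, -1))


def decode(value: int) -> str:
    """Map a 5-bit value back to its opcode name; raises KeyError if unassigned."""
    reverse = {v: k for k, v in OPCODES.items()}
    return reverse[value & 0b11111]
```

The all-zero encoding maps to 2MULT, which matches the backward-compatibility requirement that a default of all 0's leaves the multipliers behaving as before.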



Input Muxes (2 new CSMMULT bits)

Mux / Select       0          1          2          3
mult0-a            i0[15:0]   i0[31:16]  i2[15:0]   16'h0
mult0-b            i1[15:0]   i1[31:16]  i1[31:8]   24'h0
mult1-a (mult0-a)  i0[31:16]  i0[15:0]   i2[15:0]   16'h0
mult1-b (mult0-b)  i1[31:16]  i1[15:0]   i1[31:16]  16'h0
mult2-a            16'h0      i2[31:16]  i0[31:16]  i2[15:0]
mult2-b            16'h0      i3[31:16]  i3[31      i1[15:0]
mult3-a (mult2-a)  16'h0      i2[15:0]   i0[15:0]   i2[15:0]
mult3-b (mult2-b)  16'h0      i3[15:0]   16'h0      i1[31:16]



Note that the mult1 and mult3 a and b operand muxes are derived from the mult0 and mult2 a
and b operand muxes respectively. For the 16-bit high/low selects, it is useful to note that
selecting the high part of the word for mult0 selects the low part of the same word for mult1.
This is true for the low bits of all of the select fields, but the high bits are reserved for the more
special cases, such as 24*16 multiplies, the FIR opcode, and the CMULT opcode inputs.
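
The derived-select behaviour described above can be sketched as follows for the a operands of mult0 and mult1. The function name is hypothetical; the per-select choices are transcribed from the mult0-a and mult1-a rows of the input mux table.

```python
def hi16(word: int) -> int:
    """Bits [31:16] of a 32-bit word."""
    return (word >> 16) & 0xFFFF


def lo16(word: int) -> int:
    """Bits [15:0] of a 32-bit word."""
    return word & 0xFFFF


def mult01_a_operands(sel: int, i0: int, i2: int) -> tuple:
    """Return (mult0-a, mult1-a) for one shared 2-bit select value.

    mult1-a has no select of its own; it follows the mult0-a select.
    """
    mult0_a = (lo16(i0), hi16(i0), lo16(i2), 0)[sel]
    mult1_a = (hi16(i0), lo16(i0), lo16(i2), 0)[sel]
    return mult0_a, mult1_a
```

With select 1, mult0 sees the high half of i0 and mult1 the low half of the same word, so one packed 32-bit input feeds both multipliers; selects 2 and 3 are the special cases where both receive the same value.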



Mux input for adders (controls are decoded by opcodes)

Mux / Select  0          1          2            3
add0-a        i0[31:0]   m0[31:0]   32'h0        desp0[31:0]
add0-b        i1[31:0]   m1[31:0]   -m1[31:0]+1  32'h0
add1-a        i2[31:0]   m2[31:0]   a2[31:0]     desp1[31:0]
add1-b        i3[31:16]  m3[31:0]   32'h0        32'h0
add2-a        a0[31:0]   m0[31:0]   32'h0        desp2[31:0]
add2-b        a1[31:0]   a0[31:0]   i2[31:0]     32'h0
add3-a        a2[31:16]  a1[31:0]   32'h0        desp3[31:0]
add3-b        a2[15:0]   m2[31:0]   i3[31:0]     32'h0



Output Muxes (2 new CSMMULT bits)

Mux / Select    0         1       2         3         4         5        6         7
omux0 (mult-h)  m0[31:0]  lsmval  i0[31:0]  i1[31:0]  a2[31:0]  {m0,m1}  a0[31:0]  {a0,a1}
omux1 (mult-l)  m2[31:0]  lsmval  i2[31:0]  i3[31:0]  a2[cout]  {m2,m3}  a1[31:0]  a3[31:0]

Mult Shift Up (1 new CSMMULT bit)

Selectively causes the output of the multipliers to be shifted up by one bit for normalization. The 
LSB is then connected to Gnd. 

Adder Shift Down (1 new CSMMULT bit) 

Selectively causes the output of the adders to be shifted down by one bit for normalization. 
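
The two normalization shifts can be sketched as below. This assumes 32-bit two's-complement values and an arithmetic (sign-preserving) shift for the adder shift-down; the specification does not state whether the shift-down is arithmetic or logical, and the function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult_shift_up(product: int) -> int:
    """Shift a multiplier result up by one bit; the LSB is filled with 0."""
    return to_signed32(product << 1)


def adder_shift_down(total: int) -> int:
    """Shift an adder result down by one bit (arithmetic shift assumed)."""
    return to_signed32(total) >> 1


# 0.5 * 0.5 with both operands in Q15 (0x4000): the raw product is in Q30,
# and shifting up by one bit removes the redundant sign bit, giving Q31.
q30 = 0x4000 * 0x4000
q31 = mult_shift_up(q30)
```

Here `q31` is 0x20000000, which is 0.25 in Q31, so the shift-up recovers one bit of precision that the redundant sign bit would otherwise occupy.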

Input Mux (1 new population on the interconnect input mux) 

The current input mux supports the constants 0 and -1. It is proposed to add the constant
0x00010001 to allow selective multiplication by 0, 1 and -1.
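
To see why 0x00010001 is the missing constant, it helps to view each 32-bit constant as two packed signed 16-bit halves. The sketch below assumes the -1 constant is the 32-bit value 0xFFFFFFFF; the names are illustrative.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def unpack16(word: int) -> tuple:
    """Split a 32-bit word into its (high, low) signed 16-bit halves."""
    return to_signed16(word >> 16), to_signed16(word)


# The two existing input-mux constants plus the proposed one.
CONSTANTS = {
    "zero": 0x00000000,
    "minus_one": 0xFFFFFFFF,   # assumed 32-bit encoding of -1
    "packed_one": 0x00010001,  # proposed addition
}

halves = {name: unpack16(word) for name, word in CONSTANTS.items()}
```

Both halves of 0xFFFFFFFF read as -1 and both halves of 0x00000000 read as 0, but a 32-bit value of 1 would give the packed pair (0, 1); only 0x00010001 presents 1 to both 16-bit multipliers.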



2.2 Layout Floorplan 



Inputmux0
Inputmux1
Multmux2
Multmux3
Mult2
Mult3
Multmux0
Multmux1
Mult0
Mult1
Adder0
Adder1
Adder2
Adder3
Outputmux0
Outputmux1

2MULT - CS2112 Compatible mode 2 independent multipliers

[Figure: 2MULT block diagram (inputs a, b, c, d; outputs o0, o1; blocks inp0, mult0-mult3, add0-add3, Omux0, Omux1)]

[Figure: adder datapath block diagram (inputs a, b, c, d; blocks inp0-inp3, mult0-mult3, Omux0)]

4ADD16 - Sum of 4 packed 16-bit inputs, sum of upper, lower 16-bits

[Figure: 4ADD16 block diagram (inputs a, b, c, d; output o0; blocks inp0-inp3, mult0-mult3, add0-add3, Omux0, Omux1)]

4MULT - 4 multipliers with packed 16-bit inputs

[Figure: 4MULT block diagram (inp0-inp3 split into H and L halves; mult0-mult3; outputs o0H, o0L, o1H, o1L; the [31:16] halves of the multiplier results are packed through the output muxes into Oreg0 and Oreg1)]



Chameleon Systems Confidential

2X Multiplier Specification
Document Control No. 01-002
Revision 1.0



4MULTSUM - Sum of 4 multipliers

[Figure: 4MULTSUM block diagram (inputs a, b; inp0-inp3 split into H and L halves; Omux0)]

[Figure: multiplier datapath block diagram (inputs a, b, c, d split into H and L halves; four multipliers followed by adders; outputs o0, o1; blocks inp0-inp3, mult0, Omux0)]

CMULT16 - Complex Multiplier with 16-Bit Packed data, and independent delay path.
Assumes real part in High 16-bits, imaginary in Low 16-bits

[Figure: CMULT16 block diagram (inputs a, b split into H and L halves; four multipliers; outputs o0, o1; blocks inp0, Omux0)]

CMULT - 32-bit output complex multiply with 32-Bit accumulation input. Assumes real
part in High 16-bits, imaginary in Low 16-bits

[Figure: CMULT block diagram (inputs a, b; four multipliers followed by adders; output o0; Omux0)]
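
The CMULT mode can be modelled in a few lines. The sketch below is not taken from the specification: the operand pairing follows the CMULT column of the operator mux table (mult0 = i0[31:16] * i1[31:16], mult1 = i0[15:0] * i1[15:0], mult2 = i0[31:16] * i1[15:0], mult3 = i0[15:0] * i1[31:16]), while the subtraction in add0 is an assumption based on ordinary complex arithmetic. Overflow and the optional shifts are ignored, and all function names are hypothetical.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_complex(re: int, im: int) -> int:
    """Pack a complex sample: real part in bits [31:16], imaginary in [15:0]."""
    return ((re & 0xFFFF) << 16) | (im & 0xFFFF)


def unpack_complex(word: int) -> tuple:
    """Inverse of pack_complex, returning signed (re, im)."""
    return to_signed16(word >> 16), to_signed16(word)


def cmult(i0: int, i1: int, acc_re: int = 0, acc_im: int = 0) -> tuple:
    """Complex multiply of two packed inputs with 32-bit accumulation inputs."""
    a_re, a_im = unpack_complex(i0)
    b_re, b_im = unpack_complex(i1)
    m0 = a_re * b_re      # mult0
    m1 = a_im * b_im      # mult1
    m2 = a_re * b_im      # mult2
    m3 = a_im * b_re      # mult3
    a0 = m0 - m1          # add0: real part (subtraction is an assumption)
    a1 = m2 + m3          # add1: imaginary part
    a2 = a0 + acc_re      # add2: real accumulation input (i2)
    a3 = a1 + acc_im      # add3: imaginary accumulation input (i3)
    return a2, a3         # omux0, omux1
```

For example, `cmult(pack_complex(1, 2), pack_complex(3, 4))` models (1 + 2j) * (3 + 4j) and returns the real and imaginary sums as a pair.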

The 16-bit output of each 'real' despreader tree is packed with the corresponding 'imaginary' despreader
output into one 32-bit output: the output of Tree0 is packed with the output of Tree1, and Tree2's is
packed with Tree3's.

The final add before the output mux is performed inside the 2x multiplier in 4ADD16 (packed 16-bit
addition) mode.

An add-one signal decoded inside the despreader is used to determine the other operand of the final add.
The operand could either be zero or 2 packed 16-bit '0001'.
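
A packed 16-bit addition of the kind described above can be sketched as follows. It assumes each 16-bit lane wraps independently with no carry between lanes; the function and constant names are illustrative.

```python
def add16_packed(x: int, y: int) -> int:
    """Add two 32-bit words as two independent 16-bit lanes (no cross-lane carry)."""
    hi = ((x >> 16) + (y >> 16)) & 0xFFFF
    lo = ((x & 0xFFFF) + (y & 0xFFFF)) & 0xFFFF
    return (hi << 16) | lo


# The two operands named in the text for the final add:
# zero, or two packed 16-bit '0001' values.
ADD_ZERO = 0x00000000
ADD_ONE_PACKED = 0x00010001

packed_trees = 0x0001FFFF   # example value: high lane 0x0001, low lane 0xFFFF
result = add16_packed(packed_trees, ADD_ONE_PACKED)
```

In this example the low lane wraps from 0xFFFF to 0x0000 without disturbing the high lane, which is what distinguishes a packed 16-bit add from a plain 32-bit add.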

[Figure: despreader output stage. A decode block selects the add-one operand (0001,0001 / 0001,0000 / 0000,0001 / 0000,0000), which is combined with the 2 16-bit packed outputs by the packed 16-bit add in the 2x mult; a separate path carries the 32-bit chain output.]

The 32-bit chain output is added with all zeros in the 2x mult before being sent to output mux 1.
The 2 32-bit packed outputs C0 and C1 are added together before being sent to output mux 0.

A 5-bit opcode is used in the enhanced multiplier (both 2xmult and desp/corr) for decoding 9 modes in
2xmult and 12 modes in desp/corr, as shown in the following table.



Mode                 Bit[4]  Bit[3]  Bit[2]  Bit[1]  Bit[0]
4xdesp8 complex      1       1       1       0       0
4xdesp8 comp-conjug  1       1       1       0       1
4xdesp8 zero         1       1       1       1       0
4xdesp8 real         1       1       1       1       1
8xdesp8 complex      1       1       0       0       0
8xdesp8 comp-conjug  1       1       0       0       1
8xdesp8 zero         1       1       0       1       0
8xdesp8 real         1       1       0       1       1
Corr complex         1       0       1       0       0
Corr comp-conjug     1       0       1       0       1
Corr zero            1       0       1       1       0
Corr real            1       0       1       1       1
2MULT                0       0       0       0       0
4ADD32               0       0       1       0       0
4ADD16               0       0       1       0       1
4MULT                0       1       0       0       0
4MULTSUM             0       1       0       0       1
4MULT2SUM            0       1       0       1       0
4FIR                 0       1       1       0       0
CMULT                0       1       1       1       0
CMULT16              0       1       1       1       1



There is an output register in each of the desp/corr trees.

The pn code will be registered after the 2-to-1 input scrambling mux.

The outputs of all desp/corr trees go to an adder in the 2xmult before going to the output mux.